Data pre-processing is included: special characters and a minimal set of stop words are removed.
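As a rough illustration of this kind of clean-up, here is a minimal base-R sketch. The helper name `clean_text`, the set of characters it keeps, and the three-word stop list are all assumptions made for illustration; the actual pre-processing applied to these reviews is not shown in this document.

```r
# Hypothetical helper (not the original pre-processing code):
# strips special characters and a very small stop-word list.
clean_text <- function(x, stop_words = c("a", "an", "the")) {
  # Replace anything that is not a letter, digit, whitespace,
  # or basic sentence punctuation with a space
  x <- gsub("[^[:alnum:][:space:].,!?']", " ", x, perl = TRUE)
  # Remove the stop words as whole words, ignoring case
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, ignore.case = TRUE, perl = TRUE)
  # Collapse repeated whitespace and trim both ends
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

cleaned <- clean_text("The keyboard is *great* & a #bargain!")
print(cleaned)
```

Sentence punctuation is kept on purpose in this sketch, since sentence splitting later relies on it.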
Get relevant columns
cols <- c('recid', 'item_id', 'user_id', 'text')
reviews2.text <- as.data.frame(reviews2.csv[, cols])
The code block below computes the sentiment score at the sentence level. It is not executed here because generating a sentiment score for every sentence of every review takes around 10 minutes.
The sentiment score is generated using the sentimentr package.
## get sentiment by sentence using sentimentr package
# reviews_sentences <- reviews2.text %>%
# get_sentences(text) %>%
# mutate(sentence_sentiment = sentimentr::sentiment(text)$sentiment)
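For readers unfamiliar with sentimentr, the following is a small sketch of sentence-level scoring on toy input. The two example reviews are made up, and this is only one plausible way to call the package, not necessarily the exact pipeline used to produce the pre-computed file.

```r
library(sentimentr)

# Hypothetical two-review input; the real text lives in reviews2.text$text
reviews <- c("Very nice keyboard. The keys do not light up.",
             "Works great!")

sentences <- get_sentences(reviews)  # split each review into sentences
scores <- sentiment(sentences)       # one row per sentence

# element_id indexes the review, sentence_id the sentence within it
as.data.frame(scores)[, c("element_id", "sentence_id", "sentiment")]
```

The `element_id` column plays the role that `recid` plays in the pre-computed data: it is the key for aggregating sentence scores back to the review level.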
Load the pre-computed sentiment score by sentence
## read the sentiment analysis result (using sentimentr package)
reviews_sentences <- read.csv('~/Dropbox/Eugenie/data/processed/reviews_sentences.csv')
Get the sentiment of each review by taking the mean of its sentence scores
sentiment_reviews_sentence.mean <- reviews_sentences %>%
group_by(recid) %>%
summarize(sentiment_mean = mean(sentence_sentiment),
sentence_count = n()) %>%
ungroup()
Join the mean sentiment score with other selected columns
cols <- c('recid','item_id','rating','helpful_yes','helpful_total',
'image_count','word_count','brand_repeat',
'incentivized','is_deleted','verified_purchaser')
reviews2.text <- reviews2.csv[,cols]
reviews2.text <- merge(reviews2.text, sentiment_reviews_sentence.mean, by='recid')
reviews2.text[, c('incentivized','sentiment_mean')] %>%
group_by(incentivized) %>%
summarize_all(mean, na.rm = TRUE)
## # A tibble: 2 x 2
## incentivized sentiment_mean
## <fct> <dbl>
## 1 non-incentivized 0.293
## 2 incentivized 0.254
Note: Surprisingly, the incentivized sentiment mean is lower. However, the ‘sentimentr’ package is not perfect either.
For example, here is an incentivized review in which several sentences receive negative sentiment scores even though the content is relatively positive.
knitr::kable(reviews_sentences[reviews_sentences$recid=='100125154',c('text','sentence_sentiment')],
caption = "An Example of Incentivized Review with Positive Content but Negative Sentiment Score", floating.environment="sidewaystable")
| | text | sentence_sentiment |
|---|---|---|
| 218725 | Mpow Mechanical Gaming Keyboard,87 Keys Anti-Ghosting PC Gaming Keyboard with Blue SwitchesUpdate: My dad has been using this keyboard for a while, here is his additional review: Very nice keyboard, nice, great key feel, accurate. | 0.1083333 |
| 218726 | I love the mechanical key response, 4 stars, would have given 5 if the keys light up. | -0.0774597 |
| 218727 | Very nice, HIGHLY recommend!! | 0.9000000 |
| 218728 | My order for the Mpow Mechanical Gaming Keyboard,87 Keys Anti-Ghosting PC Gaming Keyboard with Blue Switches arrived in the mail quickly and timely thanks to Amazon Prime which is well worth the money if you order regularly from Amazon especially for the added free benefits of Amazon music and Amazon Video in addition to the free 2-Day Prime shipping. | 0.9308070 |
| 218729 | Mechanical keyboards have been making a surge in popularity among the tech-literate crowd due to the superior tactile response and feel that a standard keyboard lacks. | 0.2116951 |
| 218730 | For those that spend at least 8 hours a day behind a keyboard, you’d be wise to make an investment in your typing experience. | 0.1020621 |
| 218731 | If you’re looking to make the leap to a mechanical keyboard this year, or maybe just looking to add to your keyboard stable, this a great one!!! | 0.2834734 |
| 218732 | I purchased this keyboard for my dad who is an avid computer gamer. | 0.2080126 |
| 218733 | He really likes that the keyboard is mechanical and it works great!! | 0.4763140 |
| 218734 | He enjoys playing World of Tanks even more with this awesome keyboard. | -0.0635085 |
| 218735 | He really likes that the keyboard is contoured to your hands, its keeps his fingers from getting sore!! | -0.0235702 |
| 218736 | He love the fast responding time of the keys, makes for a better shot and faster tank!!! | 0.4244373 |
| 218737 | Featureso Mechanical Keyboardo Durableo Easy to Useo High QualityWe’ve had this keyboard for a few days now and it's really great for the money. | 0.3883099 |
| 218738 | The keyboard has tactile clicky switches that are mounted on a metal plate that also serves as the top surface of the keyboard, similar to other popular keyboards. | 0.2834734 |
| 218739 | Yes, it makes that satisfying THOCK sound as you type. | 0.5692100 |
| 218740 | The keys appear to be ABS plastic and the legends are printed on top. | 0.2672612 |
| 218741 | They don't appear to have any sort of coating on the keys so the legends may wear down more, but that's not a performance issue. | 0.2598076 |
| 218742 | In the market today, there are many keyboards to choose from, from cheap to real expensive. | -0.3375000 |
| 218743 | This is a win for me, this keyboard is affordable, works great and is very durable.Disclaimer: I was not compensated for this review, however I did receive the product for a discounted price for my honest and true review. | 0.9052020 |
| 218744 | All opinions are my own and not influenced in any way.5 stars – I love this item! | 0.1875000 |
| 218745 | I highly recommend everyone purchase this product!4 stars – I like it, has a few flaws, but I would purchase again.3 stars – The item was just ok, it worked, probably will not buy it again2 stars – There are probably worse products than this, but there are definitely better choices.1 star – I dislike this product, and I wish I had not bought it, I can’t recommend it to others. | 0.8528159 |
| 218746 | I will update my review in the future if I run into any problems with the product or if I find something I think would be of value to the customer. | -0.2783882 |
| 218747 | Thank you for taking the time to read my review, if you found it helpful please give me a “helpful” vote. | 0.7855844 |
As we can see, the sentence-level sentiment varies considerably in this particular case. Could this review serve as an example for further analysis of extreme variance within individual reviews?
It is also foreseeable that the variation within non-incentivized reviews would be less pronounced, since they are a lot shorter.
Boxplot: rating vs. sentence_sentiment
## fixed-effects linear model
## Use the mean sentence sentiment score in place of rating
formula.fe <- sentiment_mean ~ incentivized + is_deleted + verified_purchaser
model.fe <- plm(data = reviews2.text, formula = formula.fe, index = c('item_id'), model = 'within')
# get the model summary
summary(model.fe)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = formula.fe, data = reviews2.text, model = "within",
## index = c("item_id"))
##
## Unbalanced Panel: n = 101, T = 29-10134, N = 264016
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -3.29888 -0.19515 -0.02265 0.17020 2.69732
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## incentivizedincentivized -0.0172033 0.0076676 -2.2436 0.02486 *
## is_deleteddeleted 0.0201613 0.0025538 7.8946 2.923e-15 ***
## verified_purchaserverified 0.0190881 0.0025959 7.3531 1.943e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 24635
## Residual Sum of Squares: 24624
## R-Squared: 0.00043385
## Adj. R-Squared: 4.3737e-05
## F-statistic: 38.1826 on 3 and 263912 DF, p-value: < 2.22e-16
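The `model = 'within'` option fits a fixed-effects (within) estimator, which removes everything that is constant per `item_id`. The sketch below uses simulated data (not the review data) to show what that means: the coefficient from OLS with one dummy per item matches the coefficient from regressing item-demeaned variables. All variable names and the simulated effect size are assumptions for illustration.

```r
set.seed(1)

# Simulated toy panel: 5 items, 40 reviews each (hypothetical data)
n_items <- 5
n_per <- 40
item_id <- factor(rep(seq_len(n_items), each = n_per))
x <- rbinom(n_items * n_per, 1, 0.3)                # e.g. an incentivized flag
item_effect <- rnorm(n_items)[as.integer(item_id)]  # unobserved item-level effect
y <- 0.5 * x + item_effect + rnorm(n_items * n_per, sd = 0.2)

# (a) OLS with one dummy variable per item
b_dummy <- coef(lm(y ~ x + item_id))[["x"]]

# (b) Within transformation: subtract item means, then OLS without intercept
y_dm <- y - ave(y, item_id)
x_dm <- x - ave(x, item_id)
b_within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

# The two estimates agree up to floating-point error
print(c(dummy = b_dummy, within = b_within))
```

Because item-level differences are absorbed, the coefficients in the summary above describe variation among reviews of the same item rather than differences between items.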
cor.test(reviews2.text$rating, reviews2.text$sentiment_mean, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: reviews2.text$rating and reviews2.text$sentiment_mean
## t = 297.47, df = 264014, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4981655 0.5038793
## sample estimates:
## cor
## 0.5010279
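`cor.test()` computes a single coefficient per call; if several method names are supplied at once, only the first one, `"pearson"`, is used. Since ratings are ordinal, the rank-based coefficients may also be of interest. The sketch below, on simulated data with made-up variable names, runs each method in its own call.

```r
set.seed(42)

# Simulated stand-ins for rating and mean sentiment (hypothetical data)
rating <- sample(1:5, 200, replace = TRUE)
sentiment <- 0.1 * rating + rnorm(200, sd = 0.3)

methods <- c("pearson", "kendall", "spearman")

# One cor.test() call per method; exact = FALSE avoids the exact-p-value
# warning that ties in the ratings would otherwise trigger
estimates <- sapply(methods, function(m) {
  unname(cor.test(rating, sentiment, method = m, exact = FALSE)$estimate)
})

print(estimates)
```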
Get the sentiment of each review by taking the sum of its sentence scores
sentiment_reviews_sentence.sum <- reviews_sentences %>%
group_by(recid) %>%
summarize(sentiment_sum = sum(sentence_sentiment)) %>%
ungroup()
Join the summed sentiment score with the other selected columns
reviews2.text <- merge(reviews2.text, sentiment_reviews_sentence.sum, by='recid')
reviews2.text[, c('incentivized','sentiment_sum')] %>%
group_by(incentivized) %>%
summarize_all(mean, na.rm = TRUE)
## # A tibble: 2 x 2
## incentivized sentiment_sum
## <fct> <dbl>
## 1 non-incentivized 0.614
## 2 incentivized 2.47
Note: The incentivized sentiment sum is much higher than that of the non-incentivized group. This is consistent with the surprising finding in section 2.1.1 (a lower incentivized mean) once we recall our previous finding that incentivized reviews can be a lot longer than non-incentivized ones: more sentences contribute to the sum, so the total can be higher even when the average sentence score is lower.
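A toy calculation makes the length effect concrete. The two reviews and their sentence scores below are invented for illustration; base R `tapply()` is used here instead of the dplyr pipeline to keep the sketch dependency-free.

```r
# Hypothetical sentence scores: a short review and a long review
scores <- data.frame(
  recid = c("short", "short",
            "long", "long", "long", "long", "long", "long"),
  sentence_sentiment = c(0.4, 0.2,
                         0.25, 0.25, 0.25, 0.25, 0.25, 0.25)
)

# Review-level mean and sum of the sentence scores
mean_by <- tapply(scores$sentence_sentiment, scores$recid, mean)
sum_by  <- tapply(scores$sentence_sentiment, scores$recid, sum)

# The long review has the lower mean but the higher sum
print(rbind(mean = mean_by, sum = sum_by))
```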
Boxplot: rating vs. sentence_sentiment
## fixed-effects linear model
## Use the summed sentence sentiment score in place of rating
formula.fe <- sentiment_sum ~ incentivized + is_deleted + verified_purchaser
model.fe <- plm(data = reviews2.text, formula = formula.fe, index = c('item_id'), model = 'within')
# get the model summary
summary(model.fe)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = formula.fe, data = reviews2.text, model = "within",
## index = c("item_id"))
##
## Unbalanced Panel: n = 101, T = 29-10134, N = 264016
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -4.66658 -0.41532 -0.06511 0.34090 11.80856
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## incentivizedincentivized 1.6215649 0.0175074 92.622 < 2.2e-16 ***
## is_deleteddeleted 0.1170905 0.0058311 20.081 < 2.2e-16 ***
## verified_purchaserverified -0.1185700 0.0059273 -20.004 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 134740
## Residual Sum of Squares: 128380
## R-Squared: 0.047215
## Adj. R-Squared: 0.046843
## F-statistic: 4359.33 on 3 and 263912 DF, p-value: < 2.22e-16
cor.test(reviews2.text$rating, reviews2.text$sentiment_sum, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: reviews2.text$rating and reviews2.text$sentiment_sum
## t = 253.22, df = 264014, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4389741 0.4451123
## sample estimates:
## cor
## 0.4420484
Although the correlation between the sentiment sum and ratings is still significant (r = 0.44), the mean sentiment score shows a stronger correlation with ratings (r = 0.50).